Algebraic Algorithms for Sampling from Conditional Distributions

Authors

  • Persi Diaconis
  • Bernd Sturmfels

Abstract

We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include contingency tables, logistic regression, and spectral analysis of permutation data. The algorithms involve computations in polynomial rings using Gröbner bases.

1. Introduction. This paper describes new algorithms for sampling from the conditional distribution of a discrete exponential family given a sufficient statistic. Such distributions arise in carrying out versions of Fisher's exact test for independence and goodness of fit. They also arise in constructing uniformly most powerful tests and accurate confidence intervals via Rao-Blackwellization. These and other applications are described in Section 2. As shown below, the new algorithms are a useful supplement to traditional asymptotic theory, which is useful for large data sets, and to exact enumeration, which is useful for very small data sets.

The following example should motivate the general construction. Table 1 shows data gathered to test the hypothesis of association between birthday and deathday [Andrews and Herzberg (1985), page 429]. The table records the month of birth and the month of death for 82 descendants of Queen Victoria. A widely stated claim is that a person's birthday and deathday are associated. The usual $\chi^2$ statistic for testing independence is 115.6 on 121 degrees of freedom, suggesting no association. The classical rules of thumb for the validity of the chi-square approximation (a minimum of 5 per cell) are badly violated here, and there are too many tables with these margins to permit exact enumeration. Figure 1 shows a probability-probability plot of the permutation distribution of the chi-square statistic versus the chi-square approximation ($\chi^2_{121}$). The approximation is not particularly accurate. Indeed, the permutation probability of $\chi^2 < 115.6$ is 0.3208, versus 0.3775 for the approximation.
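The permutation distribution mentioned above can be approximated by simulation when exact enumeration is out of reach. The following is a minimal Python sketch, not the authors' code: `chi_square_independence`, `table_from_pairs`, and `permutation_probability` are hypothetical helper names, the data are assumed to be available as a list of (row, column) index pairs such as (birth month, death month), and the probability is estimated by plain random shuffling rather than by the Markov chains developed in the paper.

```python
import random
from collections import Counter


def chi_square_independence(table):
    """Pearson chi-square statistic for independence in a two-way table.

    Expected counts come from the independence model: r_i * c_j / n.
    Cells with zero expected count (an empty row or column) are skipped.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def table_from_pairs(pairs, num_rows, num_cols):
    """Cross-tabulate (row_label, col_label) pairs into a num_rows x num_cols table."""
    counts = Counter(pairs)
    return [[counts[(i, j)] for j in range(num_cols)] for i in range(num_rows)]


def permutation_probability(pairs, num_rows, num_cols, num_draws=2000, seed=0):
    """Monte Carlo estimate of P(chi^2 <= observed) under the permutation distribution.

    Shuffling the column labels against the row labels keeps both margins
    fixed, so each shuffled table is an exact draw from the hypergeometric
    distribution on tables with those margins.
    """
    rng = random.Random(seed)
    observed = chi_square_independence(table_from_pairs(pairs, num_rows, num_cols))
    rows = [a for a, _ in pairs]
    cols = [b for _, b in pairs]
    hits = 0
    for _ in range(num_draws):
        rng.shuffle(cols)
        shuffled = table_from_pairs(list(zip(rows, cols)), num_rows, num_cols)
        if chi_square_independence(shuffled) <= observed:
            hits += 1
    return observed, hits / num_draws


if __name__ == "__main__":
    # Hypothetical (row, column) index pairs; substitute real data here.
    demo = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2), (0, 2), (1, 0)]
    stat, prob = permutation_probability(demo, 3, 3)
    print(f"chi-square = {stat:.2f}, estimated P(chi-square <= observed) = {prob:.3f}")
```

For a 12 × 12 month-by-month table the statistic is referred to $\chi^2$ with $(12-1)(12-1) = 121$ degrees of freedom, matching the count quoted above. When the raw pairs of a two-way table are available, shuffling already gives exact draws; Markov chain methods are more valuable for models, such as the logistic regression example named in the abstract, where no comparably simple exact sampler is at hand.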
To illustrate the present approach, consider generating a random contingency table with fixed row and column sums. Thus, fix positive integers $I$ and $J$, row sums $r = (r_1, \dots, r_I)$, and column sums $c = (c_1, \dots, c_J)$ with a common total. Let $\mathcal{F}(r, c)$ be the set of $I \times J$ arrays $(x_{ij})$ of nonnegative integers with the given row sums and column sums. Let $H$ be the hypergeometric distribution on $\mathcal{F}(r, c)$. This is the conditional distribution of the data, given the sufficient statistics (row and column sums), for the classical model of independence. A Monte Carlo method for generating from $H$ proceeds as follows. Let $x$ be a table which satisfies the constraints. Modify $x$ by choosing a pair of rows and a pair of columns at random. These intersect in four entries and $x$ is modified as …

AMS 1991 subject classifications: 62E17, 13P10.
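The text is cut off at the point where the modification of $x$ is specified. As a minimal sketch only, assuming the commonly used ±1 swap on the four intersecting entries plus a Metropolis correction so that the chain targets $H(x) = \prod_i r_i! \prod_j c_j! / (n! \prod_{i,j} x_{ij}!)$, where $n$ is the table total, one step could look like this in Python. `swap_move_step` and `run_chain` are hypothetical names, and this is not presented as the paper's exact procedure.

```python
import math
import random


def swap_move_step(table, rng):
    """One Metropolis step on tables with fixed row and column sums.

    ASSUMPTION (the move itself is elided in the text): add
    eps * [[+1, -1], [-1, +1]] to a random 2x2 sub-array, eps = +/-1 with
    equal probability, and stay put if an entry would become negative.
    The accept/reject step targets the hypergeometric distribution H,
    for which H(x) is proportional to 1 / prod_ij x_ij!.
    Requires at least two rows and two columns.
    """
    i, k = rng.sample(range(len(table)), 2)      # two distinct rows
    j, l = rng.sample(range(len(table[0])), 2)   # two distinct columns
    eps = rng.choice((1, -1))
    cells = [(i, j, eps), (k, l, eps), (i, l, -eps), (k, j, -eps)]
    new_values = [table[a][b] + d for a, b, d in cells]
    if min(new_values) < 0:
        return table  # the move would leave F(r, c): stay at x
    # Only the four touched cells change, so log H(y) - log H(x) is a local sum.
    log_ratio = sum(
        math.lgamma(table[a][b] + 1) - math.lgamma(v + 1)
        for (a, b, _), v in zip(cells, new_values)
    )
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        proposal = [row[:] for row in table]
        for (a, b, _), v in zip(cells, new_values):
            proposal[a][b] = v
        return proposal
    return table


def run_chain(table, num_steps, seed=0):
    """Run the chain from a starting table; the margins never change."""
    rng = random.Random(seed)
    x = [row[:] for row in table]
    for _ in range(num_steps):
        x = swap_move_step(x, rng)
    return x


if __name__ == "__main__":
    start = [[3, 1, 0], [0, 2, 4]]  # hypothetical starting table
    x = run_chain(start, 1000)
    print([sum(r) for r in x], [sum(c) for c in zip(*x)])  # prints [4, 6] [3, 3, 4]
```

Because the swap adds and subtracts the same amount within each chosen row and column, every proposal keeps the margins fixed; a move can only fail by creating a negative entry, in which case the chain stays at $x$. Each move and its inverse are proposed with equal probability, so dropping the Metropolis step gives a chain whose stationary distribution is uniform on $\mathcal{F}(r, c)$.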

Similar resources

Multivariate-from-Univariate MCMC Sampler: The R Package MfUSampler

The R package MfUSampler provides Monte Carlo Markov Chain machinery for generating samples from multivariate probability distributions using univariate sampling algorithms such as slice sampler and adaptive rejection sampler. The multivariate wrapper performs a full cycle of univariate sampling steps, one coordinate at a time. In each step, the latest sample values obtained for other coordinat...


Testing Probability Distributions using Conditional Samples

We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. This is an oracle that takes as input a subset S ⊆ [N] of the domain [N] of the unknown probability distribution D and returns a draw from the conditional probability distribution D restricted to S. This new model allows ...


Bayesian Conditional Density Filtering for Big Data

We propose a Conditional Density Filtering (C-DF) algorithm for efficient online Bayesian inference. C-DF adapts Gibbs sampling to the online setting, sampling from approximations to conditional posterior distributions obtained by tracking of surrogate conditional sufficient statistics as new data arrive. This tracking eliminates the need to store or process the entire data set simultaneously. ...


MCMC methods to approximate conditional predictive distributions

Sampling from conditional distributions is a problem often encountered in statistics when inferences are based on conditional distributions which are not of closed form. Several Markov chain Monte Carlo (MCMC) algorithms to simulate from them are proposed. Potential problems are pointed out and some suitable modifications are suggested. Approximations based on conditioning sets are also explored...


Testing equivalence between distributions using conditional samples

We study a recently introduced framework [7, 8] for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. This is an oracle that takes as input a subset S ⊆ [N] of the domain [N] of the unknown probability distribution D and returns a draw from the conditional probability distribution D restricted to S....


Publication date: 1995